1  Introduction

In this final report, we describe the results of our research on distinguishing novel usage from
novel attacks in host-based intrusion detection systems.

1.1  Background

Intrusion  detection  is  vital  to  the  security  of  mission-critical  Army  computer  systems.  It  is 
especially  important  in  today’s  increasingly  wireless  environment,  which  provides  many  more 
opportunities  for  malicious  intruders  to  penetrate  Army  networks  and  do  a  great  deal  of  damage 
as  apparent  “insiders.” 

Today’s  intrusion  detection  systems  (IDSs)  fall  into  two  types.  Signature-based  IDSs  maintain 
profiles  of  aberrant  behavior  and  raise  an  alarm  when  such  behavior  occurs;  they  cannot  detect 
novel  attacks.  Anomaly-based  IDSs  maintain  profiles  of  normal  behavior  and  raise  an  alarm 
when  anything  else  occurs.  However,  they  often  generate  false  alarms.  Like  signature-based 
IDSs,  anomaly-based  IDSs  focus  on  the  problem  of  the  moment:  does  this  behavior  indicate  an 
intrusion?  In  many  cases,  the  honest  answer  is,  “I  don’t  know.”  The  event  may  be  part  of  an 
intrusion,  but  it  might  simply  be  rare  or  novel  behavior,  perhaps  caused  by  a  new  software 
release  or  a  change  in  communication  patterns.  Current  IDSs  either  sound  the  alarm — leading 
to  a  high  false  alarm  rate  and  decreased  confidence  in  the  IDS — or  quietly  forget  the  possible 
intrusion. 

The  focus  of  this  project  is  improving  host-based  anomaly  detection  techniques.  The  original 
context  of  the  work  was  the  development  of  hybrid  behavior  profiles  that  include  both  normal 
and aberrant behavior. Such profiles would work as shown in Figure 1. Novel behaviors would
be  cached  and  information  collected  about  the  behavior,  so  that  the  IDS  could  reason  about  the 
behavior,  eventually  resulting  in  the  characterization  of  a  given  behavior  as  either  normal  or 
aberrant/malicious. 


Figure 1. Triage of behaviors


The  system  that  we  originally  selected  as  the  concrete  basis  for  study  is  process  behavior  as 
characterized  by  sequences  of  kernel  calls  [Forrest96b,  Hofmeyr98].  The  Forrest  group  at  the 
University  of  New  Mexico  showed  that  short  sequences  of  calls  made  by  a  running  program  to 
the  operating  system  kernel  can  be  used  to  detect  attacks.  Forrest  observed  that  a  program  can 
be characterized by the set of traces of kernel calls that result from executing the program. A
priori,  it  is  not  obvious  how  to  represent  the  (usually  infinite)  set  of  possible  traces,  since  loops 
and  conditionals  can  cause  each  invocation  of  a  program  to  lead  to  a  different  trace.  The  Forrest 
group  devised  a  simple  but  ingenious  way  to  characterize  the  set  of  possible  traces;  they  use  the 
set  of  N-grams  produced  by  sliding  a  window  of  length  N  along  a  trace  of  a  program  process. 

An N-gram is a string of N symbols, each of which corresponds to a kernel call such as open. If
the program is run many times during a training period, then the resulting set of N-grams includes
most of those that will ever appear in normal traces.
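
The sliding-window idea can be illustrated with a minimal sketch. The function names, the toy
traces, and the choice of N=3 are assumptions made for this illustration only; they are not taken
from the Forrest group's implementation.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: List[str], n: int) -> List[NGram]:
    """Slide a window of length n along one trace of kernel calls."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_profile(training_traces: Iterable[List[str]], n: int) -> Set[NGram]:
    """Collect every N-gram seen in any training trace."""
    profile: Set[NGram] = set()
    for trace in training_traces:
        profile.update(ngrams(trace, n))
    return profile


def anomalies(trace: List[str], profile: Set[NGram], n: int) -> List[NGram]:
    """Return the N-grams of a new trace that never appeared in training."""
    return [g for g in ngrams(trace, n) if g not in profile]


# Toy traces (hypothetical kernel-call names, chosen only for the example).
training = [
    ["open", "read", "mmap", "read", "close"],
    ["open", "mmap", "read", "close"],
]
profile = build_profile(training, n=3)

suspect = ["open", "read", "mmap", "execve", "close"]
unseen = anomalies(suspect, profile, n=3)
print(unseen)
```

Every window that contains the unexpected call is reported, so a single unusual call typically
produces up to N anomalous N-grams.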

The  use  of  kernel  call  traces  was  a  major  breakthrough  in  anomaly  detection  and  for  many  years 
has  provided  the  best  characterization  of  program  behavior.  Many  alternative  detection 
algorithms  have  been  based  on  the  same  data  and  derive  from  the  N-gram  approach,  including 
[Debar98, Ghosh99, Kruegel05, Lee98, Lee99b, Marceau00, Michael02, Sekar01].

1.2  Shared  library  (DLL)  profiles 

In  the  course  of  this  research,  while  looking  for  data  streams  that  would  support  reasoning  about 
anomalies,  we  discovered  a  novel  way  of  characterizing  program  behavior  that  has  numerous 
advantages  over  previous  approaches.  This  novel  approach  provides  an  alternative  to  the 
kernel-call  data  stream.  It  is  based  on  tracing  calls  between  independently  loaded  executables 
and  libraries.  Like  the  work  on  gray-box  anomaly  detection  [Gao04a,  Gao04b]  at  Carnegie 
Mellon  University,  it  exploits  call  and  return  sequences  from  the  application  program  through 
various  libraries  and  functions  down  to  the  OS  kernel.  However,  our  purpose  is  to  factor 
profiles  and  provide  a  rich  new  source  of  data  for  analyzing  anomalies  in  a  hybrid  IDS.  In 
addition  to  supporting  reasoning  about  anomalies,  the  new  method  provides  a  more  precise  “fit” 
to  actual  program  behavior,  which  gives  it  the  potential  to  reduce  false  positives  (false  alarms) 
and  false  negatives  (missed  attacks)  simultaneously.  The  new  method,  which  we  call  “shared 
library  profiles”  or  “DLL  profiles,”  has  numerous  advantages  over  kernel-call  profiles.  It  is 
discussed  in  the  attached  paper. 

1.3  Summary  of  progress 

In  the  course  of  this  contract,  we  have 

•  Developed  a  novel  family  of  methods  for  detecting  anomalies  in  program  execution, 
called  “shared  library  profiling” 

•  Demonstrated  superior  intrusion  detection  qualities  of  shared  library  profile  techniques 

•  Demonstrated  that  the  performance  penalty  of  shared  library  profiling  is  acceptable 

•  Developed  a  novel  method  of  instrumenting  applications,  called  “cascading  wrappers,” 
which  is  able  to  capture  the  data  stream  for  shared  library  profiling 

•  Submitted a paper on DLL profiles to ICICS

These results are summarized in the following sections.

2  Shared library (DLL) profiles

A Windows process comprises multiple (kernel-supported) threads, some of which are
dedicated to GUI or system functions. Windows applications make extensive use of DLLs that
implement the operating system and supply additional functionality. The Windows kernel API
is defined by ntdll.dll. However, Windows applications rarely call ntdll.dll directly. Indeed, the
Microsoft Visual Studio development environment does not support calls to ntdll.dll. Instead,
kernel32.dll* defines the standard interface to the operating system, although a few DLLs call
ntdll directly. Many calls to kernel32 are mediated through higher-level DLLs. As a result, the
typical application call causes multiple calls through layers of DLLs that result in possibly many
calls to ntdll and the kernel.

Because calls to operations in ntdll typically originate in other Windows DLLs, not in the
application, much of the information in kernel-call traces characterizes the internal behavior of
other DLLs. Therefore, a single N-gram in such a trace often reflects the behavior of multiple
DLLs. In a short execution of Outlook, up to six DLLs at a time were represented on the call
stack. Other characterizations of the behavior of the application as a whole also describe the
combined behavior of many shared libraries.

In DLL profiling, we characterize each module (the application and the DLLs) by the calls it
makes to other DLLs, including ntdll, which implements the kernel API. When one DLL calls
another, their combined state can be represented with a stack of traces of calls between modules,
one for each current invocation of a module. Figure 2 represents a snapshot of the stack. Each
box represents a separate sequence that is currently being accumulated. In Figure 2, the most
recent inter-module call by the application is to function f() in AAA, which in turn has called
function c() in CCC. When function c() returns, the current inter-DLL sequence for function c()
is complete. If function f() calls another function in another DLL, a sequence for that function
is pushed onto the stack. Since DLLs are reentrant, the stack may include multiple
instantiations of a single module.


(a)  [stack diagram not recoverable]

(b)  AAA.f(); AAA.g(); Kernel32.HeapAlloc(); BBB.h(); ...

Figure 2. (a) The stack of inter-DLL-call sequences in a thread. (b) Each stack element is a trace
of calls from an exported function of a DLL (or the application main) to other DLLs.

* The name “Kernel32” suggests that this DLL defines an interface to the kernel. Kernel32 provides very basic
operating system functionality, but it accesses the kernel only through ntdll, which implements the kernel API. In
this report, we will commonly write DLL names without the .dll extension.

This characterization includes much the same information as the VtPath model of Feng et al.
[Feng03]. Feng et al. exploit the call stack at each system call to record calls and returns
between successive system calls. Like the VtPath model, DLL profiles detect anomalies above
the kernel-interface level. Our model differs from theirs in that it records the thread history per
calling DLL, rather than per call to ntdll. In addition, our model is much sparser because it
includes only calls between modules. At any one time, the expected number of functions on the
DLL stack is much smaller than the number of functions on the call stack, because functions
exported by a DLL are gateways to the DLL’s entire functionality, much of which may be
implemented in other functions inside that DLL. The exported function may make several calls
within the DLL before some function in the DLL makes a call to another DLL.

DLL profiles constitute a whole class of intrusion detection methods, depending on what
information is recorded in the traces and the profile for each exported DLL function. For
example, if the profile focuses on control flow, training traces record the identity of the called
functions. N-grams, automata, or other methods may be used to represent the set of traces
[Debar98, Forrest96b, Ghosh99, Hofmeyr98, Marceau00, Pfleger04, Warrender99].
Alternatively, if the profile focuses on dataflow, the training traces record not only the functions
called, but also relations among the arguments to the function being profiled and the arguments
of the functions it calls. The experiments described in this report used N-grams, with N=6,^ but
most of our results are more generally applicable.

An IDS that uses the DLL stack model for intrusion detection can be realized in a
straightforward way. We posit that the IDS maintains a profile of each function exported by a
Windows system DLL, in addition to a profile of each application module (binary or DLL) to be
protected. At run time, calls to each profiled DLL are captured, for example by mediating
connectors [Balzer99a, Balzer99b] or our instrumentation, and sent to the IDS. For each thread,
the IDS maintains a stack of currently executing modules (DLLs or the main application). For
each function in the stack, it records information about the external calls made by the function,
as in Figure 2. When an exported function of a DLL is called from another DLL, the
instrumentation informs the IDS of the call. The IDS notes the call in the trace at the top of the
DLL stack for that thread, checks for anomalies against the profile of the calling function, and
pushes a trace for the called function onto the stack. When the DLL function returns normally,
its trace is popped off the DLL stack.
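
The per-thread bookkeeping described above can be sketched as follows. This is a toy model
under stated assumptions, not the IDS built in this project: the class and method names, the
start-of-trace padding symbol, the window length of 2, and the function names in the example
(including netapi32.NetUserAdd, used only as a plausible user-creation call) are all
illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

START = "<start>"  # assumed padding so the first call in a trace is also checked
Window = Tuple[str, ...]


class DllStackMonitor:
    """Toy model: one stack of inter-DLL call traces per thread."""

    def __init__(self, profiles: Dict[str, Set[Window]], n: int = 2) -> None:
        self.profiles = profiles  # exported function -> N-grams seen in training
        self.n = n
        self.stacks: Dict[int, List[Tuple[str, List[str]]]] = defaultdict(list)
        self.alerts: List[Tuple[int, str, Window]] = []

    def on_call(self, thread_id: int, callee: str) -> None:
        stack = self.stacks[thread_id]
        if stack:
            # Note the call in the trace at the top of the stack (the caller's trace)
            # and check the newest window against the caller's profile.
            caller, trace = stack[-1]
            trace.append(callee)
            window = tuple(trace[-self.n:])
            # Design choice in this sketch: a function with no profile flags every call.
            if window not in self.profiles.get(caller, set()):
                self.alerts.append((thread_id, caller, window))
        # Push a fresh trace for the called function.
        stack.append((callee, [START] * (self.n - 1)))

    def on_return(self, thread_id: int) -> None:
        # A normal return pops the completed trace.
        self.stacks[thread_id].pop()


profiles = {
    "app.main": {(START, "AAA.f"), ("AAA.f", "AAA.g")},
    "AAA.f": {(START, "CCC.c")},
}
monitor = DllStackMonitor(profiles, n=2)
monitor.on_call(1, "app.main")
monitor.on_call(1, "AAA.f")
monitor.on_call(1, "CCC.c")
monitor.on_return(1)                           # CCC.c returns
monitor.on_call(1, "netapi32.NetUserAdd")      # not in the profile of AAA.f
monitor.on_return(1)                           # the unexpected callee returns
monitor.on_return(1)                           # AAA.f returns
monitor.on_call(1, "AAA.g")
print(monitor.alerts)
```

Only the profile of AAA.f is violated in this run; the trace of app.main remains normal, which
is the localization property discussed in Section 3.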

After an update to a DLL, the IDS continues to function but switches to training mode for the
updated DLL. When an exported function from the newly updated DLL is called, the IDS
pushes the DLL onto the stack, but instead of comparing the trace of the DLL function to the
old profile, it collects the trace for input into a new profile. When the DLL function returns, the
completed trace is added to the collection of traces for that function, and the profile creation
module of the IDS processes it. At some point, the profile is deemed sufficiently mature to be
used for detection. At that point, the IDS switches back into detection mode for that DLL
function. Note that function profiles mature at different rates, depending on how frequently the
different functions are exercised.

^ Although Forrest’s group used N=6 to model UNIX and Linux processes, a smaller value for N may be more
appropriate for tracking behavior in terms of inter-DLL calls.
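
The switch between training and detection for a single exported function might look like the
sketch below. The maturity test used here (a fixed number of consecutive traces that add no
new N-grams) is an assumption made for illustration; the report does not specify how a profile
is deemed mature. For brevity the sketch checks a trace when it completes rather than on each
call.

```python
from typing import List, Set, Tuple

Window = Tuple[str, ...]


class FunctionProfile:
    """Per-function profile that is either in training or in detection mode."""

    def __init__(self, n: int = 2, stable_runs: int = 3) -> None:
        self.n = n
        self.stable_runs = stable_runs  # assumed maturity threshold
        self.ngrams: Set[Window] = set()
        self.quiet = 0                  # consecutive traces that added nothing new
        self.training = True

    def _windows(self, trace: List[str]) -> Set[Window]:
        padded = ["<start>"] * (self.n - 1) + trace
        return {tuple(padded[i:i + self.n]) for i in range(len(trace))}

    def complete_trace(self, trace: List[str]) -> Set[Window]:
        """Process one completed trace; return anomalous N-grams (none while training)."""
        seen = self._windows(trace)
        if self.training:
            new = seen - self.ngrams
            self.ngrams |= new
            self.quiet = 0 if new else self.quiet + 1
            if self.quiet >= self.stable_runs:
                self.training = False   # profile deemed mature
            return set()
        return seen - self.ngrams

    def on_dll_update(self) -> None:
        """An update to the DLL discards the old profile and restarts training."""
        self.ngrams.clear()
        self.quiet = 0
        self.training = True


p = FunctionProfile(n=2, stable_runs=2)
for _ in range(3):
    p.complete_trace(["A", "B"])        # first run learns, next two add nothing
flagged = p.complete_trace(["A", "X"])  # now in detection mode
print(p.training, flagged)
```

Because each function carries its own state, frequently exercised functions leave training
sooner than rarely exercised ones.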

3  Benefits  of  shared  library  (DLL)  profiles 

In  the  course  of  this  work,  we  have  identified  and  provided  evidence  for  five  important  benefits 
of  DLL  profiles.  Experimental  evidence  was  based  on  developing  a  profile  of  a  small 
application  (ImgViewer32)  and  using  it  to  exercise  exploits  against  two  recently  discovered 
vulnerabilities  in  Windows  DLLs:  the  gdiplus  vulnerability  and  the  WMF  vulnerability.  Results 
for  both  exploits  are  documented  in  earlier  reports  and  in  a  paper. 

Localization  of  anomalous  behavior  to  code  modules.  In  both  cases,  the  vulnerable  functions 
exhibited  clearly  anomalous  behavior.  That  is  not  surprising.  What  may  be  more  surprising  is 
that  during  both  exploits,  most  of  the  over  800  DLL  functions  exhibited  completely  normal 
behavior.  The  profiles  of  a  few  functions  did  not  converge  during  training,  so  they  were 
ignored  when  looking  for  anomalies.  Thus,  the  attacks  could  easily  be  associated  with  the 
vulnerable  modules.  This  helps  in  isolating  the  location  of  a  new  vulnerability  and 
understanding  which  other  applications  might  be  vulnerable  (those  that  use  the  vulnerable 
DLL). 

Reduction  of  false  negatives.  The  DLL  functions  we  have  profiled  typically  have  a  narrow 
range  of  behavior.  90%  of  all  traces  are  of  length  6  or  less,  and  half  call  just  one  other  function. 
This  dramatically  reduces  the  chances  of  a  false  negative,  since  it  is  unlikely  that  attack 
behavior  happens  to  fall  into  the  narrow  range  of  the  function’s  normal  behavior. 

Resistance  to  mimicry  attacks.  The  narrow  range  of  normal  behavior  reduces  the  probability 
of  false  negatives  and  makes  mimicry  attacks  infeasible  by  making  the  target  much  smaller:  2 
DLLs  instead  of  14  and  10  functions  instead  of  over  800.  Consider  the  gdiplus  exploit,  for 
example.  The  exploit  payload  in  our  experiment  created  a  new  user  through  calls  to  the  netapi32 
DLL.  A  clever  attacker  will  avoid  such  blatant  behavior,  but  will  find  himself  constrained  by 
the  normal  profile  of  the  vulnerable  function,  in  our  case  GdiplusShutdown.  A  mimicry 
attacker  has  to  find  a  function  that  is  not  only  vulnerable  but  also  includes  the  desired 
functionality.  This  is  much  harder  to  do.  For  example,  GdiplusShutdown  does  not  call  any 
function  that  creates  new  users,  writes  files,  or  sends  messages  to  another  computer. 

Anomaly analysis. Localizing anomalies to one or more DLLs makes it possible to draw on
knowledge about the DLLs to analyze anomalies. Anomaly analysis in real time helps the IDS
decide whether to treat the anomaly as novel application behavior or an attack. For example,
the fact that the anomalies caused by the WMF exploit were in code that had been stable for
fifteen years made the anomalies much more suspicious than anomalies in a function for which
training has barely completed.

Anomaly analysis can also consider the distance of the anomaly from the profile. For example,
the gdiplus exploit called two functions in netapi32, which creates new users; netapi32 does not
appear in the profile. In addition, anomaly analysis can use information about functions (such
as their use of powerful actions like writing to files) to estimate their potential harm. Other
factors of interest are the provenance and change history of the DLL.
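
As a rough illustration of how such factors could be combined, the sketch below turns three of
them into boolean indicators. Everything in it is hypothetical: the field names, the one-year
stability threshold, the set of "powerful" DLLs, and the function names in the example are
assumptions, and the report does not describe an actual scoring scheme.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class AnomalyContext:
    callee: str                # "dll.function" that was called unexpectedly
    profile_callees: Set[str]  # callees seen in training for the calling function
    powerful_dlls: Set[str]    # DLLs assumed to expose powerful actions
    years_stable: float        # how long the calling code has been unchanged


def suspicion_indicators(ctx: AnomalyContext,
                         stable_threshold_years: float = 1.0) -> Dict[str, bool]:
    """Return one boolean per factor; more True values means more suspicious."""
    callee_dll = ctx.callee.split(".")[0]
    known_dlls = {c.split(".")[0] for c in ctx.profile_callees}
    return {
        # Distance from the profile: the whole DLL was never called in training.
        "unknown_dll": callee_dll not in known_dlls,
        # Potential harm: the callee lives in a DLL with powerful actions.
        "powerful_action": callee_dll in ctx.powerful_dlls,
        # Change history: long-stable code rarely shows new benign behavior.
        "stable_code": ctx.years_stable >= stable_threshold_years,
    }


attack_like = AnomalyContext(
    callee="netapi32.NetUserAdd",
    profile_callees={"kernel32.HeapFree", "ntdll.RtlFreeHeap"},
    powerful_dlls={"netapi32"},
    years_stable=15.0,
)
novel_usage = AnomalyContext(
    callee="kernel32.HeapAlloc",
    profile_callees={"kernel32.HeapFree"},
    powerful_dlls={"netapi32"},
    years_stable=0.1,
)
print(suspicion_indicators(attack_like))
print(suspicion_indicators(novel_usage))
```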

Easing the burden of training (and retraining). The time it takes to create a profile of an
application depends on all possible combinations of behaviors of all the DLLs the application
invokes. In a DLL profile, variations of behavior in one DLL are confined to the profile for that
DLL, avoiding a combinatorial explosion. Further, updates to Windows system DLLs are
frequent. When any update affecting an application occurs, a new round of retraining is
required. Retraining one DLL is quicker, and allows detection of anomalies in other DLLs to
continue while the updated DLL is being trained.

The  rate  at  which  profiles  converge  is  apt  to  vary  from  one  DLL  function  to  another.  In  our 
experiments,  we  were  able  to  detect  anomalies  in  functions  whose  profiles  had  converged 
quickly  and  unambiguously,  while  ignoring  the  behavior  of  other  functions  (until  their  profiles 
converged). 

4  Performance  of  shared  library  (DLL)  profiles 

In  addition  to  the  work  reported  above,  we  validated  that  the  performance  of  DLL  profiles  is 
adequate.  To  do  this,  we  instrumented  Outlook  with  wrappers,  using  techniques  described  in 
Section  5.  Using  the  Kernrate  Performance  monitoring  tool  [KrView04],  we  measured  the 
amount of time Outlook spent in both user and privileged modes for four configurations:

•  Outlook alone. In this configuration, no DLL wrappers were used.

•  Wrappers but no payload. In this configuration, Outlook DLLs were wrapped, but
the wrappers did not do anything.

•  Wrappers with simple lookup function. This configuration modeled inexpensive
intrusion detection: a table lookup of each call made by the function to another DLL.
This simple lookup would be appropriate for a DLL function that calls only a small
number of functions in other DLLs. It simply ensures that no other functions are
called.

•  Wrappers used in training. This configuration writes information about the call to a
log and is analogous to what would be used in training.
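
The simple lookup configuration amounts to a membership test per inter-DLL call. A minimal
sketch follows; the table contents and function names are made up for the example.

```python
from typing import Dict, FrozenSet

# Hypothetical table: for each profiled function, the callees seen in training.
ALLOWED_CALLEES: Dict[str, FrozenSet[str]] = {
    "AAA.f": frozenset({"CCC.c", "kernel32.HeapAlloc"}),
    "CCC.c": frozenset(),  # makes no inter-DLL calls at all
}


def call_allowed(caller: str, callee: str,
                 table: Dict[str, FrozenSet[str]] = ALLOWED_CALLEES) -> bool:
    """True if the caller is known to call this callee; unknown callers allow nothing."""
    return callee in table.get(caller, frozenset())


print(call_allowed("AAA.f", "CCC.c"))                # expected call
print(call_allowed("AAA.f", "netapi32.NetUserAdd"))  # never seen in training
```

A set lookup of this kind ignores the order of calls, which is why it suits functions that
call only a handful of others.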

We  then  computed  the  performance  penalty  for  different  combinations  of  detection  functions, 
including 

•  The  simple  lookup  function.  In  our  experiments,  90%  of  all  DLL  functions  made  6  or 
fewer  inter-DLL  calls.  We  let  the  percentage  of  simple  lookup  functions  vary  from  75% 
to  90%. 

•  A training function. For this we used our current training function, which writes to a log
file. We surmise that at any one time, training will occur for only a few functions.
However, we computed the effect of training percentages between 1% and 16%.

•  A  more  complex  detection  function,  whose  execution  time  is  halfway  between  simple 
lookup  and  training.  We  assumed  that  all  DLL  functions  that  were  not  using  simple 
lookup  and  were  not  in  training  were  using  this  (otherwise  unspecified)  detection 
function. 

The results of the computation are shown in Figure 3. In every case, the performance penalty is
under 5%, which should be acceptable for most applications.


Figure 3. Performance penalty for a mix of IDS techniques and percentage of DLLs requiring training
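
One plausible reading of this computation is a weighted average of per-technique penalties,
with the complex detection function costed halfway between simple lookup and training. The
sketch below works through that arithmetic. The weighted-average form is an assumption, and
the per-technique penalties (1% and 5%) are placeholder values, not measurements from this
project.

```python
def mixed_penalty(f_simple: float, f_training: float,
                  p_simple: float, p_training: float) -> float:
    """Overall penalty for a mix of detection functions.

    f_simple, f_training: fractions of DLL functions using simple lookup / in training.
    p_simple, p_training: penalty when every function uses that technique.
    The remaining functions use a detection function costed halfway between the two.
    """
    f_complex = 1.0 - f_simple - f_training
    if min(f_simple, f_training) < 0 or f_complex < -1e-12:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    f_complex = max(f_complex, 0.0)
    p_complex = (p_simple + p_training) / 2.0
    return f_simple * p_simple + f_training * p_training + f_complex * p_complex


# Placeholder penalties (hypothetical): 1% for simple lookup, 5% for training.
P_SIMPLE, P_TRAINING = 0.01, 0.05

worst = mixed_penalty(0.75, 0.16, P_SIMPLE, P_TRAINING)
best = mixed_penalty(0.90, 0.01, P_SIMPLE, P_TRAINING)
print(f"{worst:.4f} {best:.4f}")
```

Combinations whose fractions exceed 100% (for example 90% simple lookup with 16% in training)
are rejected rather than silently clipped.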


5  Instrumentation 

DLL profiling is of particular interest in Windows systems, which are extremely common.
Since we have experience in instrumenting Windows systems, it seemed straightforward to
apply  our  techniques  to  collecting  data  on  inter-executable  calls.  In  this  section,  we  will 
describe  our  previous  experience  with  instrumenting  Windows,  problems  we  encountered  in  our 
initial  attempts  to  collect  data  on  inter-DLL  calls,  and  the  approach  that  finally  brought  results. 

In  another  project  at  ATC-NY,  ntdll.dll  functions  were  wrapped  so  that  calls  to  the  kernel  could 
be trapped. In this project, the wrapper technology was further developed to map the locations
of DLLs in memory, trace the program stack from the call to ntdll.dll up to the call from the
main binary, and note the transitions from one DLL to another. This technique made it
possible  to  track  calls  between  executables;  the  technology  was  also  used  in  a  successful 
DARPA-funded  effort  [Marceau05]  to  profile  resource  use  by  applications. 

However,  stack  tracing  has  limitations.  For  example,  many  programs  are  compiled  with  stack 
frame  pointer  optimization,  which  means  that  they  short-circuit  the  normal  calling  and  register 
use  conventions,  making  it  impossible  to  identify  return  pointers  on  the  stack.  The  basic 
problem  is  the  lack  of  a  standard  stack  discipline  that  makes  it  possible  to  reliably  trace  back 
from  callee  to  caller  on  the  stack.  In  order  to  make  the  instrumentation  more  robust,  another 
solution was required. As an alternative, we investigated Microsoft Detours [Detours05], a
library for instrumenting Win32 functions by rewriting the target function images, and
Teknowledge’s mediating connector wrappers [Balzer99]. However, Detours requires
identifying every function that is to be instrumented and manually writing “trampoline”
instrumentation code for each one, and Balzer’s work currently has the same limitation.

We have therefore developed an alternative instrumentation approach, which we call “cascading
wrappers.” In this approach, whenever a DLL is loaded by the application, the load is
intercepted and the DLL wrapped dynamically. (DLL loads are performed by the ntdll.dll
function LdrLoadDll.) As a result, all DLLs ultimately invoked by the application are wrapped
and instrumented, and invocations of functions within them are logged. To catch invocations to
other DLLs through a function pointer, we trap the caller’s request for the function pointer and
return a pointer to the (wrapped) DLL function.
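
The cascading effect can be modeled in a few lines. The sketch below is a simulation only: it
uses ordinary Python objects in place of DLLs and does not touch LdrLoadDll or any Windows
API. The class names, the registry of module factories, and the get_proc method (standing in
for a function-pointer request) are all assumptions made for the illustration.

```python
from typing import Callable, Dict, List


class Module:
    """Stand-in for a loaded DLL: a name plus a table of exported functions."""

    def __init__(self, name: str, exports: Dict[str, Callable]) -> None:
        self.name = name
        self.exports = exports


class WrappingLoader:
    """Simulated loader that wraps every module at the moment it is loaded."""

    def __init__(self, registry: Dict[str, Callable[["WrappingLoader"], Module]]) -> None:
        self.registry = registry  # module name -> factory that builds the module
        self.loaded: Dict[str, Module] = {}
        self.log: List[str] = []

    def load(self, name: str) -> Module:
        if name not in self.loaded:
            module = self.registry[name](self)
            # Intercept the load: replace each export with a logging wrapper.
            module.exports = {fn: self._wrap(name, fn, impl)
                              for fn, impl in module.exports.items()}
            self.loaded[name] = module
        return self.loaded[name]

    def get_proc(self, name: str, fn: str) -> Callable:
        """Function-pointer request: the caller always receives the wrapper."""
        return self.load(name).exports[fn]

    def _wrap(self, module: str, fn: str, impl: Callable) -> Callable:
        def wrapper(*args, **kwargs):
            self.log.append(f"{module}.{fn}")
            return impl(*args, **kwargs)
        return wrapper


def make_ccc(loader: WrappingLoader) -> Module:
    return Module("CCC", {"c": lambda: "done"})


def make_aaa(loader: WrappingLoader) -> Module:
    def f():
        # AAA pulls in CCC only when f() runs; the load is intercepted as well.
        return loader.get_proc("CCC", "c")()
    return Module("AAA", {"f": f})


loader = WrappingLoader({"AAA": make_aaa, "CCC": make_ccc})
result = loader.get_proc("AAA", "f")()
print(loader.log, result)
```

No list of functions to instrument is supplied in advance: wrapping AAA is enough for CCC to be
wrapped when AAA first loads it.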

We  are  not  aware  of  previous  efforts  to  do  automatic  dynamic  wrapping.  Both  Teknowledge’s 
wrappers  and  Microsoft’s  Detours  product  require  prior  knowledge  of  the  API  and  custom 
programming to capture calls. Because it is automatic, our technique cannot exploit knowledge
of the API or of argument values; we are, however, able to log the fact that a call has been
made.

Both our wrappers and Teknowledge wrappers have “blind spots”: neither can detect all
calls made between Windows DLLs, although both can detect most calls. Teknowledge
wrappers are defined on exported functions, while ours are based on imported functions.
However, two types of call evade detection by either method, because they invoke
functions that do not appear in the DLL’s export table:

•  If  a  DLL  implements  a  C++  class,  a  non-exported  function  may  be  called  through  a  C++ 
class  table.  These  calls  are  invisible  to  both  Teknowledge  wrappers  and  ours. 

•  In  certain  other  cases,  the  caller  has  independent  knowledge  of  the  structure  of  the  called 
DLL  and  is  able  to  locate  a  function  in  the  DLL  without  using  the  export  table. 

We  believe  that  additional  research  on  DLL  instrumentation  would  be  of  great  benefit  to 
advancing  the  state  of  the  art  in  host-based  intrusion  detection. 

6  Publications 

C.  Marceau  and  M.  Stillerman,  "Modular  Behavior  Profiles  in  Systems  with  Shared  Libraries," 
submitted  to  the  Eighth  International  Conference  on  Information  and  Communications  Security 
(ICICS  '06),  2006. 

7  Conclusions 

In  the  past  three  years,  we  have  implemented  and  tested  a  novel  data  stream  for  host-based 
anomaly  detection  that  helps  to  distinguish  between  novel  behavior  and  novel  attacks.  This  data 
stream  results  in  a  closer  approximation  to  actual  program  behavior  than  has  hitherto  been 
available  and  makes  it  possible  to  reduce  both  false  positives  and  false  negatives,  discourage 
mimicry attacks, reduce the burden of training, and provide information for anomaly analysis.
DLL profiles lead to a family of intrusion detection methods that all enjoy these advantages.